I approached the SuperTuxKart community fearing some backslash due to last week s discussion about their release 0.9, to find instead a nice, friendly and welcoming community. I have already had some very nice talks with them since then, and they have patiently explained to me the sequence of events that led to the situation that I mentioned and that, for the sake of fairness, I consider that I have to share here too. You can read the log of the first conversation I had with them (the log has been edited and cleared up for clarity and readability). I seriously recommend reading it, it s a honest friendly conversation, and it s first hand.
For those who don t already know the game:
All this story seems to start with the complain of a 6 yo girl, close relative of one of the developers and STK user, who explained that she always felt that Mario Kart was better because there was a princess in it. I m not particularly happy with princesses as role models for girls, but one thing I have always said is that we have to listen to kids and take their opinions into accounts, and I know that if I had such a request from one of the kids closer to me, I probably would have fulfilled it too. In any case, Free Software projects based on volunteer work are essentially a do-ocracy and it is assumed that whoever does the work, gets to decide about it.
So that is how Princess Sara was added to the game. While developing it, I was assured that they took extra care that her proportions were somehow realistic, and not as distorted as we re used to see in Barbie or many Disney films. Sara is inspired on an OpenGameArt s wizard and is not supposed to be a weak damsel in distress, but in fact a powerful character in the world s universe.
Sara is not the only female character playable. There are a few others: Suzanne (a monkey, Blender s mascot), Xue (XFCE s mouse) and Amanda (a panda, the mascot of windows maker). Sara happens to be the only human character playable, male or female. While it has been argued that by adding that character, a player might have the impression that the rest of the characters would be male by default, I have been told that the intention is exactly the opposite,and that the fact that the only human playable character in the game is female should make it more attractive to girls. To some, at least.
Here are some images of Sara:
So the fact is that they have invested a lot of time in developing Sara s model. I m not an artist myself, so I don t know first hand how much time and effort it takes to make such a model, but in any case it seems that quite a lot. When they designed the beach track Gran Paradiso, they wanted to add people to the beach. That track is, in fact, inspired on a real existing place: Princess Juliana Airport. Time was over and they wanted to publish a version with what they already had, so they used Sara s model in a bikini on the beach, with the intention of adding more people, male and female, later. The overall view of the beach would be:
This is how that track shows when the players are driving in it:
Now, about the poster of version 0.9, it is supposed to be inspired in the previous poster of version 0.8.1, only this time inspired in Carnival (which is, in fact, a celebration in which sexualization of both genders is a core part). I know that there are accusations of cultural apropriation, but I couldn t know, as my white privilege probably shields me from seeing that. Up to now, no one has said anything about that, only Gunnar explaining his point of view as a non-native mexican: While the poster does not strike as the most cautious possible, I do not see it as culturally offensive. It does not attempt to set a scene portraiting what were the cultures really like; the portrait it paints is similar to so many fantasy recreations . In my opinion, even when the model is done in good taste, with no superbig breasts and no unrealistic waist, it s still depicting a girl without much clothes as the main element of the scene, with an attire, a posture and an attitude that clearly resembles carnival and, thus, inevitably conveys a message of sexualization. Even though I can t deny that it s a cute poster, it s one I wouldn t be happy to see for example in a school, if someone wanted to promote the game there.
The author of the poster, anyway, tells me that he had a totally different intention when doing it, and he wanted to depict a powerful princess, in the center of SuperTuxKart s universe, celebrating the new engine.
About the panties showing every now and then, I ve been told that it s something so hard to see that in fact you would really have to open the model itself to view them. I m not saying that I like them though, I think it would have been better if Sara would have had short pants under the skirt, if she was going to drive the snowmobile with a dress, but I m not sure if that s something important enough to condemn the game. The original girl mentioned at the beginning of this post seems to have found the animation funny, started laughing, and said that Sara is very silly, and that was all. It s probably something more silly than naughty, I guess. Even though, as I said, it s something I don t like too much. I don t have to agree with STK developers in everything. I guess.
There s one thing I would like to highlight about my conversations with the developers of SuperTuxKart, though. I like them. They seem to be as concerned about the wellbeing of kids as I am, they have their own ethic norms of what s acceptable and what s not, and they want to do something to be proud of. Sometimes, many of these conflicts arise from a lack of trust. When I first saw the screenshots with the girl in bikini and the panties showing, I was honestly concerned about the direction the project was taking. After having talked with the developers, I am more calmed about it, because they seem to have their heart in the right place, they care, they are motivated and they work hard. I don t know if a princess would be my first choice for a main female character, but at least their intention seems to be to give some girls a sensible role model in the game with who they can identify.
It has been recently discussed in Debian-Women and Debian-Games mailing lists, but for all of you who don t read those mailing lists and might have kids or use free games with kids in the classroom, or stuff like that, I thought it might be good to talk about it here.
SuperTuxKart is a free 3D kart racing game, similar to Mario Kart, with a focus on having fun over realism. The characters in the game are the mascots of free and open source projects, except for Nolok, who does not represent a particular open source project, but was created by the SuperTux Game Team as the enemy of Tux.
On April 21, 2015, version 0.9 (not yet in Debian) was released which used the Antarctica graphics engine (a derivative of Irrlicht) and enabled better graphics appearance and features such as dynamic lighting, ambient occlusion, depth of field, and global illumination.
Along with this new engine comes a poster with a sexualized white woman is wearing an outfit that can be depicted as a mix of Native american clothes from different nation and a halo of feathers, as well as many models of her in a bikini swim suit, all along the game, even in the hall of the airport.
They say an image is worth more than a thousand words, don t they?
Today, Aaron Shaw and I are pleased to announce a new startup. The startup is based around an app we are building called RomancR that will bring the sharing economy directly into your bedrooms and romantic lives.
When launched, RomancR will bring the kind of market-driven convenience and efficiency that Uber has brought to ride sharing, and that AirBnB has brought to room sharing, directly into the most frustrating and inefficient domain of our personal lives. RomancR is Uber for romance and sex.
Here s how it will work:
Users will view profiles of nearby RomancR users that match any number of user-specified criteria for romantic matches (e.g., sexual orientation, gender, age, etc).
When a user finds a nearby match who they are interested in meeting, they can send a request to meet in person. If they choose, users initiating these requests can attach an optional monetary donation to their request.
When a user receives a request, they can accept or reject the request with a simple swipe to the left or right. Of course, they can take the donation offer into account when making this decision or counter-offer with a request for a higher donation. Larger donations will increase the likelihood of an affirmative answer.
If a user agrees to meet in person, and if the couple then subsequently spends the night together RomancR will measure this automatically by ensuring that the geolocation of both users phones match the same physical space for at least 8 hours the donation will be transferred from the requester to the user who responded affirmatively.
Users will be able to rate each other in ways that are similar to other sharing economy platforms.
Of course, there are many existing applications like Tinder and Grindr that help facilitate romance, dating, and hookups. Unfortunately, each of these still relies on old-fashion intrinsic ways of motivating people to participate in romantic endeavors. The sharing economy has shown us that systems that rely on these non-monetary motivations are ineffective and limiting! For example, many altruistic and socially-driven ride-sharing systems existed on platforms like Craigslist or Ridejoy before Uber. Similarly, volunteer-based communities like Couchsurfing and Hospitality Club existed for many years before AirBnB. None of those older systems took off in the way that their sharing economy counterparts were able to!
The reason that Uber and AirBnB exploded where previous efforts stalled is that this new generation of sharing economy startups brings the power of markets to bear on the problems they are trying to solve. Money both encourages more people to participate in providing a service and also makes it socially easier for people to take that service up without feeling like they are socially in debt to the person providing the service for free. The result has been more reliable and effective systems for proving rides and rooms! The reason that the sharing economy works, fundamentally, is that it has nothing to do with sharing at all! Systems that rely on people s social desire to share without money projects like Couchsurfing are relics of the previous century.
RomancR, which we plan to launch later this year, will bring the power and efficiency of markets to our romantic lives. You will leave your pitiful dating life where it belongs in the dustbin of history! Go beyond antiquated non-market systems for finding lovers. Why should we rely on people s fickle sense of taste and attractiveness, their complicated ideas of interpersonal compatibility, or their sense of altruism, when we can rely on the power of prices? With RomancR, we won t have to!
Note: Thanks to Yochai Benkler whose example of how leaving a $100 bill on the bedside table of a person with whom you spent the night can change the nature of the a romantic interaction inspired the idea for this startup.
Today, Aaron Shaw and I are pleased to announce a new startup. The startup is based around an app we are building called RomancR that will bring the sharing economy directly into your bedrooms and romantic lives.
When launched, RomancR will bring the kind of market-driven convenience and efficiency that Uber has brought to ride sharing, and that AirBnB has brought to room sharing, directly into the most frustrating and inefficient domain of our personal lives. RomancR is Uber for romance and sex.
Here s how it will work:
Users will view profiles of nearby RomancR users that match any number of user-specified criteria for romantic matches (e.g., sexual orientation, gender, age, etc).
When a user finds a nearby match who they are interested in meeting, they can send a request to meet in person. If they choose, users initiating these requests can attach an optional monetary donation to their request.
When a user receives a request, they can accept or reject the request with a simple swipe to the left or right. Of course, they can take the donation offer into account when making this decision or counter-offer with a request for a higher donation. Larger donations will increase the likelihood of an affirmative answer.
If a user agrees to meet in person, and if the couple then subsequently spends the night together RomancR will measure this automatically by ensuring that the geolocation of both users phones match the same physical space for at least 8 hours the donation will be transferred from the requester to the user who responded affirmatively.
Users will be able to rate each other in ways that are similar to other sharing economy platforms.
Of course, there are many existing applications like Tinder and Grindr that help facilitate romance, dating, and hookups. Unfortunately, each of these still relies on old-fashion intrinsic ways of motivating people to participate in romantic endeavors. The sharing economy has shown us that systems that rely on these non-monetary motivations are ineffective and limiting! For example, many altruistic and socially-driven ride-sharing systems existed on platforms like Craigslist or Ridejoy before Uber. Similarly, volunteer-based communities like Couchsurfing and Hospitality Club existed for many years before AirBnB. None of those older systems took off in the way that their sharing economy counterparts were able to!
The reason that Uber and AirBnB exploded where previous efforts stalled is that this new generation of sharing economy startups brings the power of markets to bear on the problems they are trying to solve. Money both encourages more people to participate in providing a service and also makes it socially easier for people to take that service up without feeling like they are socially in debt to the person providing the service for free. The result has been more reliable and effective systems for proving rides and rooms! The reason that the sharing economy works, fundamentally, is that it has nothing to do with sharing at all! Systems that rely on people s social desire to share without money projects like Couchsurfing are relics of the previous century.
RomancR, which we plan to launch later this year, will bring the power and efficiency of markets to our romantic lives. You will leave your pitiful dating life where it belongs in the dustbin of history! Go beyond antiquated non-market systems for finding lovers. Why should we rely on people s fickle sense of taste and attractiveness, their complicated ideas of interpersonal compatibility, or their sense of altruism, when we can rely on the power of prices? With RomancR, we won t have to!
Note: Thanks to Yochai Benkler whose example of how leaving a $100 bill on the bedside table of a person with whom you spent the night can change the nature of the a romantic interaction inspired the idea for this startup.
Review: The Ring of Charon, by Roger MacBride Allen
Series:
Hunted Earth #1
Publisher:
Tor
Copyright:
December 1990
ISBN:
0-8125-3014-4
Format:
Mass market
Pages:
500
Larry Chao is a junior scientist at a gravity research on Pluto, at the
very outer limits of human reach in the solar system. The facility is for
researching artificial gravity, which is one reason it's in the middle of
nowhere. Another is that their experimental generator is built in a ring
around Charon, and the close proximity of the two bodies is useful for
scientific observation. Unfortunately, there hasn't been much to observe.
They can create very short-lived gravity fields in very small areas, but
nothing like the artificial gravity that was the original promise of the
facility.
As a result, the government is shutting the facility down. The
authoritarian director, Simon Raphael, is... not exactly happy with that
decision, but resigned to it and running the facility to completion with a
sullen anger. When Larry makes a startling breakthrough at nearly the
last minute for the society, Simon is uninterested and hostile. This
leads Larry and his fellow scientist Sondra Berghoff to attempt a more
radical demonstration of Larry's success and prove to the rest of the
solar system that the facility should be kept open. That decision has a
far deeper impact on humanity and the solar system than they could have
possibly imagined.
The Ring of Charon and its sequel, Shattered Sphere, were
recommended to me as good harder science fiction. It took me a while to
track down copies in fact, it took an in-person trip to
Powell's. Once I found them, a relatively
straightforward, old-school science fiction novel seemed like just the
thing to read during my commute.
Allen delivers there. I'm not spoiling the main plot driver of the book
even though it's given away on the back cover, since it's some time in
coming in the novel. But The Ring of Charon turns into a
multi-viewpoint cross between a disaster novel and a scientific
investigation. Larry and Sondra stay central to the plot, of course, but
Allen adds a variety of other characters who are attempting to figure out
what happened to the solar system and then deal with the consequences:
everyone from scientists to pilots to communications officers in weird
surrealistic stations.
The science is mostly believable, apart from the scientific breakthrough
that motivates the plot. Characterization isn't absent completely, but
it's simple and unsubtle; dialogue is a bit wooden, characters aren't
entirely multidimensional, but Allen does a reasonably good job with both
pacing and the sense of mystery and investigation, and a surprisingly good
job portraying organizational politics.
As you might guess from the tone of my review, this is not the book to
reach for if you want something ground-breaking. It's a very
conventional, multi-viewpoint SF novel full of scientists and
investigation of unknown and possibly hostile phenomena. If you've read
much science fiction, you've read books like this before. But one thing
that Allen does surprisingly well, which makes The Ring of Charon
stand a bit above the pack, is that he doesn't write villains. Even
Simon, who goes out of his way to make the reader hate him at the start of
the book, becomes a surprisingly sympathetic character. The characters
who are usually villains or at least foils in books like this the smooth
PR person, the religious man, the blindly-focused scientist who isn't
interested in anyone else's theories never turn into caricatures, play
important roles in the plot, and turn out to be competent in their fields
of expertise.
There are actual villains, sort of, but I found myself feeling sympathetic
even towards them, at least in places. Allen takes a rather old SF dodge
to achieve conflict without an evil enemy, and, because of that, the end
of the book felt like a bit of an anticlimax. But I did like the feel of
the book where there isn't a good versus evil fight, just a wide variety
of people (and others) trying to understand and control the universe in
the best ways they know how.
I'm not sure I can quite recommend this book. The quality of the writing
is not particularly high, and I'm not generally a fan of the disaster
novel style of storytelling. But despite not being very original, there's
just something likable about it. It moves along reasonably well for a 500
page book, and it's refreshingly light on irritating stereotypes. I think
one has to be in the right mood when reading it and set expectations
accordingly, but it fit what I was looking for when I picked it up.
One warning, though: although The Ring of Charon reaches a climax,
the major plot conflict is not resolved at the end of this book, so you
may want to have the sequel around.
Followed by Shattered Sphere.
Rating: 6 out of 10
In wikis, redirects are special pages that silently take readers from the page they are visiting to another page. Although their presence is noted in tiny gray text (see the image below) most people use them all the time and never know they exist. Redirects exist to make linking between pages easier, they populate Wikipedia s search autocomplete list, and are generally helpful in organizing information. In the English Wikipedia, redirects make up more than half of all article pages.
Over the years, I ve spent some time contributing to to Redirects for Discussion (RfD). I think of RfD as like an ultra-low stakes version of Articles for Deletion where Wikipedians decide whether to delete or keep articles. If a redirect is deleted, viewers are taken to a search results page and almost nobody notices. That said, because redirects are almost never viewed directly, almost nobody notices if a redirect is kept either!
I ve told people that if they want to understand the soul of a Wikipedian, they should spend time participating in RfD. When you understand why arguing about and working hard to come to consensus solutions for how Wikipedia should handle individual redirects is an enjoyable way to spend your spare time where any outcome is invisible you understand what it means to be a Wikipedian.
That said, wiki researchers rarely take redirects into account. For years, I ve suspected that accounting for redirects was important for Wikipedia research and that several classes of findings were noisy or misleading because most people haven t done so. As a result, I worked with my colleague Aaron Shaw at Northwestern earlier this year to build a longitudinal dataset of redirects that can capture the dynamic nature of redirects. Our work was published as a short paper at OpenSym several months ago.
It turns out, taking redirects into account correctly (especially if you are looking at activity over time) is tricky because redirects are stored as normal pages by MediaWiki except that they happen to start with special redirect text. Like other pages, redirects can be updated and changed over time are frequently are. As a result, taking redirects into account for any study that looks at activity over time requires looking at the text of every revision of every page.
Using our dataset, Aaron and I showed that the distribution of edits across pages in English Wikipedia (a relationships that is used in many research projects) looks pretty close to log normal when we remove redirects and very different when you don t. After all, half of articles are really just redirects and, and because they are just redirects, these articles are almost never edited.
Another puzzling finding that s been reported in a few places and that I repeated myself several times is that edits and views are surprisingly uncorrelated. I ll write more about this later but the short version is that we found that a big chunk of this can, in fact, be explained by considering redirects.
We ve published our code and data and the article itself is online because we paid the ACM s open access fee to ransom the article.
Following the lead of my dear friend Daniel and his fantastic and addictive Summing up series, here s a link pack of recent stuff I read around the web.
Link pack is definitely a terrible name, but I m working on it.
How to Silence Negative Thinking
On how to avoid the pitfall of being a Negatron and not an Optimist Prime. You might be your own worst enemy and you might not even know it:
Psychologists use the term automatic negative thoughts to describe the ideas that pop into our heads uninvited, like burglars, and leave behind a mess of uncomfortable emotions. In the 1960s, one of the founders of cognitive therapy, Aaron Beck, concluded that ANTs sabotage our best self, and lead to a vicious circle of misery: creating a general mindset that is variously unhappy or anxious or angry (take your pick) and which is (therefore) all the more likely to generate new ANTs. We get stuck in the same old neural pathways, having the same negative thoughts again and again.
Meet Harlem s Official Street Photographer A man goes around Harlem with his camera, looking to give instead of taking. Makes you think about your approach to people and photography, things can be simpler. Kinda like Humans of New York, but in Harlem. And grittier, and on film but as touching, or more:
I tell people that my camera is a healing mechanism, Allah says. Let me photograph it and take it away from you.
What Happens When We Let Industry and Government Collect All the Data They Want
Why having nothing to hide is not about the now, but about the later. It s not that someone is going to judge for pushing every detail of your life to Twitter and Instagram, it s just that something you do might be illegal a few years later:
There was a time when it was essentially illegal to be gay. There was a time when it was legal to own people and illegal for them to run away. Sometimes, society gets it wrong. And it s not just nameless bureaucrats; it s men like Thomas Jefferson. When that happens, strong privacy protections including collection controls that let people pick who gets their data, and when allow the persecuted and unpopular to survive.
Hi,
geoip version 1.6.2-2 and geoip-database version 20141027-1 are now available in Debian unstable/sid, with some news of more free databases available :)
geoip changes:
* Add patch for geoip-csv-to-dat to add support for building GeoIP city DB.
Many thanks to Andrew Moise for contributing!
* Add and install geoip-generator-asn, which is able to build the ASN DB. It
is a modified version from the original geoip-generator. Much thanks for
contributing also to Aaron Gibson!
* Bump Standards-Version to 3.9.6 (no changes required).
geoip-database changes:
* New upstream release.
* Add new databases GeoLite city and GeoLite ASN to the new package
geoip-database-extra. Also bump build depends on geoip to 1.6.2-2.
* Switch to xz compression for the orig tarball.
I got word via the Electronic Frontier Foundation about an act of injustice happening to a person for doing... Not only what I do day to day, but what I promote and believe to be right: Sharing academic articles.
Diego is a Colombian, working towards his Masters degree on conservation and biodiversity in Costa Rica. He is now facing up to eight years imprisonment for... Sharing a scholarly article he did not author on Scribd.
Many people lack the knowledge and skills to properly set up a venue to share their articles with people they know. Many people will hope for the best and expect academic publishers to be fundamentally good, not to send legal threats just for the simple, noncommercial act of sharing knowledge. Sharing knowledge is fundamental for science to grow, for knowledge to rise. Besides, most scholarly studies are funded by public money, and as the saying goes, they should benefit the public. And the public is everybody, is all of us.
And yes, if this sounds in any way like what drove Aaron Swartz to his sad suicide early this year... It is exactly the same thing. Thankfully (although, sadly, after the sad fact), thousands of people strongly stood on Aaron's side on that demand. Please sign the EFF petition to help Diego, share this, and try to spread the word on the real world needs for Open Access mandates for academics!
Some links with further information:
I haven't made one of these in a long time, so I have some catching from
random purchases to do, which includes a (repurposed) nice parting gift
from my previous employer and a trip to Powell's since I was in the area
for DebConf14. This also includes the contents of the Hugo voter's
packet, which contained a wide variety of random stuff even if some of the
novels were represented only by excerpts.
John Joseph Adams (ed.) The Mad Scientist's Guide to World
Domination (sff anthology)
Roger McBride Allen The Ring of Charon (sff)
Roger McBride Allen The Shattered Sphere (sff)
Iain M. Banks The Hydrogen Sonata (sff)
Julian Barnes The Sense of an Ending (mainstream)
M. David Blake (ed.) 2014 Campbellian Anthology (sff anthology)
Algis Budrys Benchmarks Continued (non-fiction)
Algis Budrys Benchmarks Revisited (non-fiction)
Algis Budrys Benchmarks Concluded (non-fiction)
Edgar Rice Burroughs Carson of Venus (sff)
Wesley Chu The Lives of Tao (sff)
Ernest Cline Ready Player One (sff)
Larry Correia Hard Magic (sff)
Larry Correia Spellbound (sff)
Larry Correia Warbound (sff)
Sigrid Ellis & Michael Damien Thomas (ed.) Queer Chicks Dig Time
Lords (non-fiction)
Neil Gaiman The Ocean at the End of the Lane (sff)
Max Gladstone Three Parts Dead (sff)
Max Gladstone Two Serpents Rise (sff)
S.L. Huang Zero Sum Game (sff)
Robert Jordan & Brandon Sanderson The Wheel of Time (sff)
Drew Karpyshyn Mass Effect: Revelation (sff)
Justin Landon & Jared Shurin (ed.) Speculative Fiction 2012
(non-fiction)
John J. Lumpkin Through Struggle, the Stars (sff)
L. David Marquet Turn the Ship Around! (non-fiction)
George R.R. Martin & Raya Golden Meathouse Man (graphic novel)
Ramez Naam Nexus (sff)
Eiichiro Oda One Piece Volume 1 (manga)
Eiichiro Oda One Piece Volume 2 (manga)
Eiichiro Oda One Piece Volume 3 (manga)
Eiichiro Oda One Piece Volume 4 (manga)
Alexei Panshin New Celebrations (sff)
K.J. Parker Devices and Desires (sff)
K.J. Parker Evil for Evil (sff)
Sofia Samatar A Stranger in Olondria (sff)
John Scalzi The Human Division (sff)
Jonathan Straham (ed.) Fearsome Journeys (sff anthology)
Vernor Vinge The Children of the Sky (sff)
Brian Wood & Becky Cloonan Demo (graphic novel)
Charles Yu How to Live Safely in a Science Fictional Universe (sff)
A whole bunch of this is from the Hugo voter's packet, and since the Hugos
are over, much of that probably won't get prioritized. (I was very happy
with the results of the voting, though.)
Other than that, it's a very random collection of stuff, including a few
things that I picked up based on James Nicoll's reviews. Now that I have
a daily train commute, I should pick up the pace of reading, and as long
as I can find enough time in my schedule to also write reviews, hopefully
there will be more content in this blog shortly.
A little more than 2 years ago, the
Ceilometer project was launched inside
the OpenStack ecosystem. Its main objective was to measure OpenStack cloud
platforms in order to provide data and mechanisms for functionalities such
as billing, alarming or capacity planning.
In this article, I would like to relate what I've been doing with other
Ceilometer developers in the last 5 months. I've lowered my involvement in
Ceilometer itself directly to concentrate on solving one of its biggest
issue at the source, and I think it's largely time to take a break and talk
about it.
Ceilometer early design
For the last years, Ceilometer didn't change in its core architecture.
Without diving too much in all its parts, one of the early design decision
was to build the metering around a data structure we called samples. A
sample is generated each time Ceilometer measures something. It is composed
of a few fields, such as the the resource id that is metered, the user and
project id owning that resources, the meter name, the measured value, a
timestamp and a few free-form metadata. Each time Ceilometer measures
something, one of its components (an agent, a pollster ) constructs and
emits a sample headed for the storage component that we call the
collector.
This collector is responsible for storing the samples into a database. The
Ceilometer collector uses a pluggable storage system, meaning that you can
pick any database system you prefer. Our original implementation has been
based on MongoDB from the beginning, but we then added a SQL driver, and
people contributed things such as HBase or DB2 support.
The REST API exposed by Ceilometer allows to execute various reading
requests on this data store. It can returns you the list of resources that
have been measured for a particular project, or compute some statistics on
metrics. Allowing such a large panel of possibilities and having such a
flexible data structure allows to do a lot of different things with
Ceilometer, as you can almost query the data in any mean you want.
The scalability issue
We soon started to encounter scalability issues in many of the read requests
made via the REST API. A lot of the requests requires the data storage to do
full scans of all the stored samples. Indeed, the fact that the API allows
you to filter on any fields and also on the free-form metadata (meaning non
indexed key/values tuples) has a terrible cost in terms of performance (as
pointed before, the metadata are attached to each sample generated by
Ceilometer and is stored as is). That basically means that the sample data
structure is stored in most drivers in just one table or collection, in
order to be able to scan them at once, and there's no good "perfect"
sharding solution, making data storage scalability painful.
It turns out that the Ceilometer REST API is unable to handle most of the
requests in a timely manner as most operations are O(n) where n is the
number of samples recorded (see
big O notation if you're
unfamiliar with it). That number of samples can grow very rapidly in an
environment of thousands of metered nodes and with a data retention of
several weeks. There is a few optimizations to make things smoother in
general cases fortunately, but as soon as you run specific queries, the API
gets barely usable.
During this last year, as the Ceilometer PTL, I discovered these issues
first hand since a lot of people were feeding me back with this kind of
testimony. We engaged several blueprints to improve the situation, but it
was soon clear to me that this was not going to be enough anyway.
Thinking outside the box
Unfortunately, the PTL job doesn't leave him enough time to work on the
actual code nor to play with anything new. I was coping with most of the
project bureaucracy and I wasn't able to work on any good solution to tackle
the issue at its root. Still, I had a few ideas that I wanted to try and as
soon as I stepped down from the PTL role, I stopped working on Ceilometer
itself to try something new and to think a bit outside the box.
When one takes a look at what have been brought recently in Ceilometer, they
can see the idea that Ceilometer actually needs to handle 2 types of data:
events and metrics.
Events are data generated when something happens: an instance start, a
volume is attached, or an HTTP request is sent to an REST API server. These
are events that Ceilometer needs to collect and store. Most OpenStack
components are able to send such events using the notification system built
into oslo.messaging.
Metrics is what Ceilometer needs to store but that is not necessarily tied
to an event. Think about an instance CPU usage, a router network bandwidth
usage, the number of images that Glance is storing for you, etc These are
not events, since nothing is happening. These are facts, states we need to
meter.
Computing statistics for billing or capacity planning requires both of these
data sources, but they should be distinct. Based on that assumption, and the
fact that Ceilometer was getting support for storing events, I started to
focus on getting the metric part right.
I had been a system administrator for a decade before jumping into OpenStack
development, so I know a thing or two on how monitoring is done in this
area, and what kind of technology operators rely on. I also know that
there's still no silver bullet this made it a good challenge.
The first thing that came to my mind was to use some kind of time-series
database, and export its access via a REST API as we do in all OpenStack
services. This should cover the metric storage pretty well.
Cooking Gnocchi
At the end of April 2014, this led met to start a new project code-named
Gnocchi. For the record, the name was picked after confusing so many times
the OpenStack Marconi project, reading OpenStack Macaroni instead. At least
one OpenStack project should have a "pasta" name, right?
The point of having a new project and not send patches on Ceilometer, was
that first I had no clue if it was going to make something that would be any
better, and second, being able to iterate more rapidly without being
strongly coupled with the release process.
The first prototype started around the following idea: what you want is to
meter things. That means storing a list of tuples of (timestamp, value) for
it. I've named these things "entities", as no assumption are made on what
they are. An entity can represent the temperature in a room or the CPU usage
of an instance. The service shouldn't care and should be agnostic in this
regard.
One feature that we discussed for several OpenStack summits in the
Ceilometer sessions, was the idea of doing aggregation. Meaning, aggregating
samples over a period of time to only store a smaller amount of them. These
are things that time-series format such as the
RRDtool have been doing for a long time on
the fly, and I decided it was a good trail to follow.
I assumed that this was going to be a requirement when storing metrics into
Gnocchi. The user would need to provide what kind of archiving it would
need: 1 second precision over a day, 1 hour precision over a year, or even
both.
The first driver written to achieve that and store those metrics inside
Gnocchi was based on whisper. Whisper
is the file format used to store metrics for the
Graphite project. For the actual storage,
the driver uses Swift, which has the advantages to be part of OpenStack and
scalable.
Storing metrics for each entities in a different whisper file and putting
them in Swift turned out to have a fantastic algorithm complexity: it was
O(1). Indeed, the complexity needed to store and retrieve metrics doesn't
depends on the number of metrics you have nor on the number of things you
are metering. Which is already a huge win compared to the current Ceilometer
collector design.
However, it turned out that whisper has a few limitations that I was
unable to circumvent in any manner. I needed to patch it to remove a lot of
its assumption about manipulating file, or that everything is relative to
now (time.time()). I've started to hack on that in my own fork, but then
everything broke. The whisper project code base is, well, not the state of
the art, and have 0 unit test. I was starring at a huge effort to transform
whisper into the time-series format I wanted, without being sure I wasn't
going to break everything (remember, no test coverage).
I decided to take a break and look into alternatives, and stumbled upon
Pandas, a data manipulation and statistics
library for Python. Turns out that Pandas support time-series natively, and
that it could do a lot of the smart computation needed in Gnocchi. I built a
new file format leveraging Pandas for computing the time-series and named it
carbonara (a wink to both the
Carbon project and pasta, how
clever!). The code is quite small (a third of whisper's, 200 SLOC vs 600
SLOC), does not have many of the whisper limitations and it has test
coverage. These Carbonara files are then, in the same fashion, stored into
Swift containers.
Anyway, Gnocchi storage driver system is designed in the same spirit that
the rest of OpenStack and Ceilometer storage driver system. It's a plug-in
system with an API, so anyone can write their own driver. Eoghan Glynn has
already started to write a InfluxDB driver, working
closely with the upstream developer of that database. Dina Belova started to
write an OpenTSDB driver. This helps to make sure
the API is designed directly in the right way.
Handling resources
Measuring individual entities is great and needed, but you also need to link
them with resources. When measuring the temperature and the number of a
people in a room, it is useful to link these 2 separate entities to a
resource, in that case the room, and give a name to these relations, so one
is able to identify what attribute of the resource is actually measured. It
is also important to provide the possibility to store attributes on these
resources, such as their owners, the time they started and ended their
existence, etc.
Once this list of resource is collected, the next step is to list and filter
them, based on any criteria. One might want to retrieve the list of
resources created last week or the list of instances hosted on a particular
node right now.
Resources also need to be specialized. Some resources have attributes that
must be stored in order for filtering to be useful. Think about an instance
name or a router network.
All of these requirements led to to the design of what's called the
indexer. The indexer is responsible for indexing entities, resources, and
link them together. The initial implementation is based on
SQLAlchemy and should be pretty efficient. It's
easy enough to index the most requested attributes (columns), and they are
also correctly typed.
We plan to establish a model for all known OpenStack resources (instances,
volumes, networks, ) to store and index them into the Gnocchi indexer in
order to request them in an efficient way from one place. The generic
resource class can be used to handle generic resources that are not tied to
OpenStack. It'd be up to the users to store extra attributes.
Dropping the free form metadata we used to have in Ceilometer makes sure
that querying the indexer is going to be efficient and scalable.
REST API
All of this is exported via a REST API that was partially designed and
documented in the
Gnocchi specification in the Ceilometer repository;
though the spec is not up-to-date yet. We plan to auto-generate the
documentation from the code as we are currently doing in Ceilometer.
The REST API is pretty easy to use, and you can use it to manipulate
entities and resources, and request the information back.
Roadmap & Ceilometer integration
All of this plan has been exposed and discussed with the Ceilometer team
during the last
OpenStack summit in Atlanta
in May 2014, for the Juno release. I led a session about this entire
concept, and convinced the team that using Gnocchi for our metric storage
would be a good approach to solve the Ceilometer collector scalability
issue.
It was decided to conduct this project experiment in parallel of the current
Ceilometer collector for the time being, and see where that would lead the
project to.
Early benchmarks
Some engineers from Mirantis did a few benchmarks around Ceilometer and also
against an early version of Gnocchi, and Dina Belova presented them to us
during the mid-cycle sprint we organized in Paris in early July.
The following graph sums up pretty well the current Ceilometer performance
issue. The more you feed it with metrics, the more slow it becomes.
For Gnocchi, while the numbers themselves are not fantastic, what is
interesting is that all the graphs below show that the performances are
stable without correlation with the number of resources, entities or
measures. This proves that, indeed, most of the code is built around a
complexity of O(1), and not O(n) anymore.
Next steps
While the Juno cycle is being wrapped-up for most projects, including
Ceilometer, Gnocchi development is still ongoing. Fortunately, the composite
architecture of Ceilometer allows a lot of its features to be replaced by
some other code dynamically. That, for example, enables Gnocchi to provides
a Ceilometer dispatcher plugin for its collector, without having to ship the
actual code in Ceilometer itself. That should help the development of
Gnocchi to not be slowed down by the release process for now.
The Ceilometer team aims to provide Gnocchi as a sort of technology preview
with the Juno release, allowing it to be deployed along and plugged with
Ceilometer. We'll discuss how to integrate it in the project in a more
permanent and strong manner probably during the
OpenStack Summit for Kilo
that will take place next November in Paris.
Just saw http://sny.no/2014/04/dbts and I feel compelled to note that distributed bug trackers are not new the earliest I personally encountered was Aaron Bentley s Bugs everywhere coming up on it s 10th birthday. BE meets many of the criteria in the dbts post I read earlier today, but it hasn t taken over the world and I think this is in large part due to the propogation nature of bugs being very different to code different solutions are needed.
XXXX: With distributed code versioning we often see people going to some effort to avoid conflicts semantic conflicts are common, and representation conflicts extremely common.The idions
Take for example https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/805661. Here we can look at the nature of the content:
Concurrent cannot-conflict content e.g. the discussion about the bug. In general everyone should have this in their local bug database as soon as possible, and anyone can write to it.
Observations of fact e.g. the code change that should fix the bug has landed in Ubuntu or Commit C should fix the bug .
Reports of symptoms e.g. Foo does not work for me in Ubuntu with package versions X, Y and Z .
Collaboratively edited metadata tags, title, description, and arguably even the fields like package, open/closed, importance.
Note that only one of these things the commit to fix the bug happens in the same code tree as the code; and that the commit fixing it may be delayed by many things before the fix is available to users. Also note that conceptually conflicts can happen in any of those fields except 1).
Anyhow my humble suggestion for tackling the conflicts angle is to treat all changes to a bug as events in a timeline e.g. adding a tag foo is an event to add foo , rather than an event setting the tags list to bar,foo then multiple editors adding foo do not conflict (or need special handling). Collaboratively edited fields would be likely be unsatisfying with this approach though last-writer-wins isn t a great story. OTOH the number of people that edit the collaborative fields on any given bug tend to be quite low so one could defer that to manual fixups.
Further, as a developer wanting local access to my bug database, syncing all of these things is appealing but if I m dealing with a million-bug bug database, I may actually need the ability to filter what I sync or do not sync with some care. Even if I want everything, query performance on such a database is crucial for usability (something git demonstrated convincingly in the VCS space).
Lastly, I don t think distributed bug tracking is needed it doesn t solve a deeply burning use case offline access would be a 90% solution for most people. What does need rethinking is the hugely manual process most bug systems use today. Making tools like whoopsie-daisy widely available is much more interesting (and that may require distributed underpinnings to work well and securely). Automatic collation of distinct reports and surfacing the most commonly experienced faults to developers offers a path to evidence based assessment of quality something I think we badly need.
Background
I ve previously written about the claim that people use Autism as an excuse for bad behavior [1]. In summary it doesn t and such claims instead lead to people not being assessed for Autism.
I ve also previously written about empathy and Autism in the context of discussions about conference sexual harassment [2]. The main point is that anyone who s going to blame empathy disorders for the widespread mistreatment of women in society and divert the subject from the actions of average men to men in minority groups isn t demonstrating empathy.
Discussions of the actions of average men are so often derailed to cover Autism that the Geek Feminism Wiki has a page about the issue of blaming Autism [3].
The Latest Issue
Last year Shanley Kane wrote an informative article for Medium titled What Can Men Do about the treatment of women in the IT industry [4]. It s a good article, I recommend reading it. As an aside @shanley s twitter feed is worth reading [5].
In response to Shanley s article Jeff Atwood wrote an article of the same title this year which covered lots of other things [6]. He writes about Autism but doesn t seem to realise that officially Asperger Syndrome is now Autism according to DSM-V (they decided that separate diagnosis of Autism, Asperger Syndrome, and PDD-NOS were too difficult and merged them). Asperger Syndrome is now a term that refers to historic issues (IE research that was published before DSM-V) and slang use.
Gender and the Autism Spectrum
Jeff claims that autism skews heavily towards males at a 4:1 ratio and cites the Epidemiology of Autism Wikipedia page as a reference. Firstly that page isn t a great reference, I fixed one major error (which was obviously wrong to anyone who knows anything about Autism and also contradicted the cited reference) in the first section while writing this post.
The Wikipedia page cites a PDF about the Epidemiology of Autism that claims the 4.3:1 ratio of boys to girls [7]. However that PDF is a summary of other articles and the one which originated the 4.3:1 claim is behind a paywall. One thing that is worth noting in the PDF is that the section containing the 4.3:1 claim also references claims about correlations between race and Autism and studies contradicting such claims it notes the possibility of ascertainment bias . I think that anyone who reads that section should immediately consider the possibility of ascertainment bias in regard to the gender ratio.
Most people who are diagnosed with Autism are diagnosed as children. An Autism diagnosis of a child is quite subjective, an important part is an IQ test (where the psychologist interprets the intent of the child in the many cases where answers aren t clear) to compare social skills with IQ. So whether a child is diagnosed is determined by the psychologist s impression of the child s IQ vs the impression of their social skills.
Whether a child is even taken for assessment depends on whether they act in a way that s considered to be obviously different. Any child who is suspected of being on the Autism Spectrum will be compared other children who have been diagnosed (IE mostly boys) and this will probably increase the probability that a boy will be assessed. So an Aspie girl might not be assessed because she acts like other Aspie girls not like the Aspie boys her parents and teachers have seen.
The way kids act is not solely determined by neuro-type. Our society expects and encourages boys to be louder than girls and take longer and more frequent turns to speak, this is so widespread that I don t think it s possible for parents to avoid it if their kids are exposed to the outside world. Because of this boys who would be diagnosed with Asperger Syndrome by DSM-IV tend to act in ways that are obviously different from other kids. While the combination of Autism and the the social expectations on girls tends to result in girls who are quiet, shy, and apologetic. The fact that girls are less obviously different and that their differences cause fewer difficulties for parents and teachers makes them less likely to be assessed. Note that the differences in behavior of boys and girls who have been diagnosed is noted by the professionals (and was discussed at a conference on AsperGirls that my wife attended) while the idea that this affects assessment rates is my theory.
Jeff also cites the book The Essential Difference: Male And Female Brains And The Truth About Autism by Professor Simon Baron-Cohen (who s (in)famous for his Extreme Male Brain theory). The first thing to note about the Extreme Male Brain theory are that it depends almost entirely on the 4.3:1 ratio of males to females on the Autism Spectrum (which is dubious as I noted above). The only other evidence in support of it is subjective studies of children which suffer from the same cultural issues this is why double blind tests should be used whenever possible. The book Delusions of Gender by Cordelia Fine [8] debunks Simon Baron-Cohen s work among other things. The look inside feature of the Amazon page for Delusions of Gender allows you to read about Simon Baron-Cohen s work [9].
Now even if the Extreme Male Brain theory had any merit it would be a really bad idea to cite it (or a book based on it) if you want to make things better for women in the IT industry. Cordelia s book debunks the science and also shows how such claims about supposed essential difference are taken as exclusionary.
The Problem with Jeff Atwood
Jeff suggests in his post that men should listen to women. Then he and his followers have a huge flame-war with many women over twitter during which which he tweeted Trying to diversify my follows by following any female voices that engaged me in a civil, constructive way recently . If you only listen to women who agree with you then that doesn t really count as listening to women. When you have a stated policy of only listening to women who agree then it seems to be more about limiting what women may feel free to say around you. The Geek Feminism wiki page about the Tone Argument [10] says the following:
One way in which the tone argument frequently manifests itself is as a call for civility. A way to gauge whether a request for civility is sincere or not is to ask whether the person asking for civility has more power along whatever axes are contextually relevant (see Intersectionality) than the person being called incivil , less power, or equal power. Often, people who have the privilege of being listened to and taken seriously level accusations of incivility as a silencing tactic, and label as incivil any speech or behavior that questions their privilege. For example, some men label any feminist thought or speech as hostile or impolite; there is no way for anybody to question male power or privilege without being called rude or aggressive. Likewise, some white people label any critical discussion of race, particularly when initiated by people of color, as incivil.
Writing about one topic is also a really good idea. A blog post titled What Can Men Do should be about things that men can do. Not about Autism, speculation about supposed inherent differences between men and women which are based on bad research, gender diversity in various occupations, etc. Following up a post on What Can Men Do with discussion (in blog comments and twitter) about what women should do before they are allowed to join the conversation is ridiculous. Jeff s blog post says that men should listen to women, excluding women based on the tone argument is gross hypocrisy.
Swearing
Jeff makes a big deal of the fact that Shanley uses some profane language in her tweets. This combines a couple of different ways of silencing women. It s quite common for women to be held to a high standard of ladylike behavior, while men get a free pass on doing the same thing. One example of this is the Geek Feminism article about the results of Sarah Sharp s request for civility in the Linux kernel community [11]. That s not an isolated incident, to the best of my recollection in 20+ years my local Linux Users Group has had only one debate about profanity on mailing lists in that case a woman (who is no longer active in the group) was criticised for using lesser profanity than men used both before and after with no comment (as an experiment I used some gratuitous profanity a couple of weeks later and no-one commented).
There is also a common difference in interpretation of expressions of emotion, when a woman seems angry then she invariably has men tell her to change her approach (even when there are obvious reasons for her anger) while when a man is angry the possibility that other people shouldn t make him angry will usually be considered.
The issues related to the treatment of women have had a large affect on Shanley s life and her friend s lives. It s quite understandable that she is angry about this. Her use of profanity in tweets seems appropriate to the situation.
Other Links
Newsweek s Gentlemen in Technology article has a section about Jeff [12], it s interesting to note his history of deleting tweets and editing his post. I presume he will change his post in response to mine and not make any note of the differences.
Jacob Kaplan-Moss wrote a good rebuttal to Jeff s post [13]. It s a good article and has some other relevant links that are worth reading.
My friend Aaron Swartz died a little more than a year ago. This time last year, I was spending much of my time speaking with journalists and reading what they were writing about Aaron.
Since the anniversary of his death, I have tried to take time to remember Aaron. I ve returned to the things I wrote and the things I said including this short article published last year in Red Pepper that SJ Klein and I wrote together but that I forgot to mention on my blog.
I m also excited to see that a documentary film about Aaron premiered at the Sundance Film Festival last week. I was interviewed for the film but am not in it.
As I said last year at a memorial for Aaron, I think about Aaron frequently and often think about my own decisions in terms of what Aaron would have done. I continued to be optimistic about the potential for Aaron-inspired action.
This post is just a bunch of random notes, but I think that things are going
well, this time.
Btrfs
Just started playing with btrfs after converting a 2TB disk
with about 1.3TB of data from ext4 to btrfs. This particular
filesystem contains my backup data and I hope that btrfs is able to live up
to high standards.
I will probably write a detailed review with my impressions, but suffice to
say that it is working fine and I hope that I will be happy for a long time
to come. My other filesystems, for the moment, will be ext4.
youtube-dl
Some people may have noticed, others may not, but when downloading
videos from Youtube, they apparently are getting more aggressive with the
use of Dynamic Adaptive Streaming over HTTP (also called
DASH) and, as a result, some (perhaps going to be all in the near future?)
of the videos may not be available in the resolution/formats that you used
to like (like me, with format number 35).
By the way, one thing that is interesting with youtube videos provided via
DASH is that they are available in different streams: one for the video and
another for the audio.
What does this mean in practical terms for users of youtube-dl? Well, if
you wanted to download videos in resolutions like the 480p (format 35) that
I mentioned, then you will probably have to change your way of doing things,
until a more automated solution is in place.
You will have to download both the audio and the video and, then, "combine"
them (that is, multiplex them) to create one "normal" video file with both
the audio and the video. I usually do this via:
ffmpeg -i audio.m4a -i video.mp4 -vcodec copy -acodec copy combined.mp4
If you prefer having a Matroska container instead of an mp4 container
(which, BTW, results in smaller muxing overhead), then you can use the
command line:
mkvmerge -o combined.mkv audio.m4a video.mp4
Oh, those m4a and mp4 extensions are a new addition that
I just sent upstream (in the past, both would have been named with an
extension of mp4).
As, an aside, I like formats 135 for video and 140 for audio, for the
reasons that I mentioned in a comments to issue 1612:
Otherwise, to download 480p videos (which I do for lectures and so on with
other projects of mine, like edx-dl) I have to call youtube-dl twice: once
for format 135 and another for format 140, since the old format 35
files are much smaller than the lower resolution 360p files (due to the
former being encoded in High profile vs. the latter being encoded in
Constrained Baseline profile).
While this is unfortunate for some, this is a good thing for others: I once
had a blind user of youtube-dl asking me if he could avoid downloading the
whole video just to extract the audio, so that he save on bandwidth. Well,
now this is possible.
Expect a new version of youtube-dl to be uploaded soon to Debian unstable.
Music
It is so nice to see the music that I like getting better. It was
particularly pleasant to seeTarja Turunen being joined
on stage by Floor Jansen.
In fact, it is my impression that the once female-fronted bands are getting
more and more into collaborations and side-projects and they are, in many
ways, getting more refined. Examples of this includes Floor, Tarja,
Anneke van Giersbergen, Sharon den Adel,
Kari Ruesl tten and many others that I can't remember right
now.
Of course, some thanks to people that take the time to film, and
upload/share the videos so that other people not at these events can watch
them is highly appreciated and they usually are treated as anonymous, but,
of course, this should be fixed.
I've just pushed out the release files for Obnam version
1.5, my backup application, and Larch,
my B-tree library, which Obnam uses.
They are available via my home page (http://liw.fi/). These
versions have alos been uploaded to Debian unstable.
NEWS for Obnam:
Terminal progress reporting now updated only every 0.1 seconds,
instead of 0.01 seconds, to reduce terminal emulator CPU usage.
Reported by Neal Becker.
Empty exclude patterns are ignored. Previously, a configuration file
line such as "exclude = foo, bar," (note trailing comma) would result
in an empty pattern, which would match everything, and therefore
nothing would be backed up. Reported by Sharon Kimble.
A FUSE plugin to access (read-only) data from the backup repository
has been added. Written by Valery Yundin.
NEWS for larch:
Bug fix in how Larch handles partly-comitted B-tree journals
in read-only mode. Previously, this would result in a crash
if, say, a node had been removed, but the B-tree metadata
hadn't been committed.
On Wednesday, I successfully defended my PhD dissertation in front of a ridiculously packed house at the MIT Media Lab. I am humbled by the support shown by the MIT Sloan, Media Lab, and Harvard communities. Earlier today, I finished up paperwork and submitted my archival copies. I m done.
Although I ve often heard PhDs described as emotional roller coasters, I feel enormously blessed in that I honestly can t relate. My eight years at MIT and Harvard have been almost universally positive and I have learned and grown indescribably. As excited as I am about my next chapter at the University of Washington, I m going to miss my life here. Deeply.
My dissertation was three essays on volunteer mobilization in peer production. Once I have a chance to catch up and recover, I ll be posting the previously unpublished pieces. The Remixing Dilemma was included in the dissertation and is already online. The Media Lab AV team shot professional video of the talk. When I get a copy of the video, I ll post that too.
But because I think it s important, I ve formatted and published the acknowledgments section of the dissertation today. Although there are too many folks to thank, I ve highlighted the contributions of my co-authors, and friends, Aaron Shaw and Andr s Monroy Hern ndez and my almost unbelievably incredible group of advisors: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.